Search results for "Sequence comparison"

showing 7 items of 7 documents

Efficient Algorithms for Sequence Analysis with Entropic Profiles

2017

Entropy, being closely related to repetitiveness and compressibility, is a widely used information-related measure to assess the degree of predictability of a sequence. Entropic profiles are based on information theory principles, and can be used to study the under-/over-representation of subwords, by also providing information about the scale of conserved DNA regions. Here, we focus on the algorithmic aspects related to entropic profiles. In particular, we propose linear time algorithms for their computation that rely on suffix-based data structures, more specifically on the truncated suffix tree (TST) and on the enhanced suffix array (ESA). We performed an extensive experimental campaign …

0301 basic medicine; Compressed suffix array; Theoretical computer science; Entropy; Suffix tree; 0206 medical engineering; Generalized suffix tree; 02 engineering and technology; String searching algorithm; Information theory; law.invention; 03 medical and health sciences; law; Genetics; Animals; Humans; Mathematics; Applied Mathematics; Suffix array; Computational Biology; DNA; Sequence Analysis DNA; Data structure; 030104 developmental biology; Suffix; Alignment free; Sequence analysis; Sequence comparison; Algorithms; 020602 bioinformatics; Biotechnology; IEEE/ACM Transactions on Computational Biology and Bioinformatics

researchProduct

An effective extension of the applicability of alignment-free biological sequence comparison algorithms with Hadoop

2016

Alignment-free methods are one of the mainstays of biological sequence comparison, i.e., the assessment of how similar two biological sequences are to each other, a fundamental and routine task in computational biology and bioinformatics. They have gained popularity since, even on standard desktop machines, they are faster than methods based on alignments. However, with the advent of Next-Generation Sequencing Technologies, datasets whose size, i.e., number of sequences and their total length, is a challenge to the execution of alignment-free methods on those standard machines are quite common. Here, we propose the first paradigm for the computation of k-mer-based alignment-free methods for…

0301 basic medicine; Theoretical computer science; 030102 biochemistry & molecular biology; Settore INF/01 - Informatica; Computer science; Computation; Extension (predicate logic); Information Systems; Hash table; Distributed computing; Task (project management); 03 medical and health sciences; 030104 developmental biology; Alignment-free sequence comparison and analysis; Hadoop; Hardware and Architecture; Sequence comparison; MapReduce; Software

researchProduct
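
The k-mer-based alignment-free comparison mentioned in the entry above can be illustrated with a minimal single-machine sketch. It is not the Hadoop paradigm the paper proposes: it only counts k-mers in two sequences and takes the Euclidean distance between the count vectors, one common k-mer-based dissimilarity. The function names `kmer_counts` and `kmer_distance` and the default `k=3` are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """Euclidean distance between the k-mer count vectors of a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # A Counter returns 0 for k-mers that are absent from a sequence.
    return sqrt(sum((ca[w] - cb[w]) ** 2 for w in set(ca) | set(cb)))


if __name__ == "__main__":
    print(kmer_distance("ACGTACGTAC", "ACGTTCGTAC", k=3))
```

Because the per-sequence counting step is independent of every other sequence, it is the kind of work that can be spread across machines, which is presumably where a MapReduce formulation helps on large datasets.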

Fast comparison of DNA sequences by oligonucleotide profiling

2008

Provisional abstract and full-text PDF files correspond to the article as it appeared upon acceptance. Fully formatted PDF and final abstract will be made available soon.

Bioinformatics; Fast speed; ADN; Oligonucleotide Profiling; lcsh:Medicine; Genomics; Computational biology; Biology; Genome; General Biochemistry Genetics and Molecular Biology; DNA sequencing; Conserved sequence; chemistry.chemical_compound; Technical Note; Profiling (information science); lcsh:Science (General); lcsh:QH301-705.5; Medicine(all); Oligonucleotide; Biochemistry Genetics and Molecular Biology(all); lcsh:R; General Medicine; Genòmica; UVWORD; chemistry; lcsh:Biology (General); DNA sequence comparison; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; DNA; lcsh:Q1-390; BMC Research Notes

researchProduct

Textual data compression in computational biology: Algorithmic techniques

2012

In a recent review [R. Giancarlo, D. Scaturro, F. Utro, Textual data compression in computational biology: a synopsis, Bioinformatics 25 (2009) 1575–1586] the first systematic organization and presentation of the impact of textual data compression for the analysis of biological data has been given. Its main focus was on a systematic presentation of the key areas of bioinformatics and computational biology where compression has been used together with a technical presentation of how well-known notions from information theory have been adapted to successfully work on biological data. Rather surprisingly, the use of data compression is pervasive in computational biology. Starting from…

Biological data; Data Compression Theory and Practice; Alignment-free sequence comparison; Entropy; Huffman coding; Hidden Markov Models; Kolmogorov complexity; Lempel–Ziv compressors; Minimum Description Length principle; Pattern discovery in bioinformatics; Reverse engineering of biological networks; Sequence alignment; Settore INF/01 - Informatica; General Computer Science; Computer science; Search engine indexing; Computational biology; Information theory; Information science; Theoretical Computer Science; Technical Presentation; Entropy (information theory); Data compression; Computer Science Review

researchProduct

An extension of the Burrows-Wheeler Transform

2007

We describe and highlight a generalization of the Burrows–Wheeler Transform (bwt) to a multiset of words. The extended transformation, denoted by ebwt, is reversible. Moreover, it allows one to define a bijection between the words over a finite alphabet A and the finite multisets of conjugacy classes of primitive words in A∗. Besides its mathematical interest, the extended transform can be useful for applications in the context of string processing. In the last part of this paper we illustrate one such application, providing a similarity measure between sequences based on ebwt.

Discrete mathematics; Multiset; Similarity (geometry); General Computer Science; Burrows–Wheeler transform; Generalization; Alignment-free distance measure; Sequence comparison; Context (language use); Similarity measure; Theoretical Computer Science; Conjugacy class; Bijection; Computer Science::Formal Languages and Automata Theory; Computer Science(all); Mathematics

researchProduct
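
As background for the entry above, here is a minimal sketch of the classical Burrows–Wheeler Transform of a single non-empty word, computed naively by sorting all rotations. It is not the extended transform (ebwt) for multisets of words that the paper defines; the function names `bwt` and `inverse_bwt` are assumptions made for illustration, and the quadratic-space rotation table is chosen for readability rather than efficiency.

```python
def bwt(word):
    """Return (last column of the sorted rotations, row index of the word)."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    last_column = "".join(r[-1] for r in rotations)
    return last_column, rotations.index(word)


def inverse_bwt(last_column, index):
    """Rebuild the original word from the last column and its row index."""
    table = [""] * len(last_column)
    # Prepending the last column and re-sorting, repeated len(word) times,
    # reconstructs the full sorted table of rotations.
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    return table[index]


if __name__ == "__main__":
    transformed, idx = bwt("banana")
    print(transformed, idx)
    print(inverse_bwt(transformed, idx))
```

The row index is needed for inversion because the last column alone identifies the word only up to rotation, that is, up to its conjugacy class.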

A New Combinatorial Approach to Sequence Comparison

2008

In this paper we introduce a new alignment-free method for comparing sequences which is combinatorial by nature and uses neither a compressor nor any information-theoretic notion. Such a method is based on an extension of the Burrows-Wheeler Transform, a transformation widely used in the context of Data Compression. The new extended transformation takes as input a multiset of sequences and produces as output a string obtained by a suitable rearrangement of the characters of all the input sequences. By using such a transformation we give a general method for comparing sequences that takes into account how much the characters coming from the different input sequences are mixed in the output…

Multiset; Theoretical computer science; Burrows–Wheeler transform; Settore INF/01 - Informatica; Computer science; Sequence comparison; String (computer science); Context (language use); Extension (predicate logic); Comparison; Information theory; Genome; Transformation (function); Categorization; Computational Theory and Mathematics; Phylogenetics; Theory of computation; Algorithm; Mathematics; Data compression

researchProduct

Alignment-Free Sequence Comparison over Hadoop for Computational Biology

2015

Sequence comparison, i.e., the assessment of how similar two biological sequences are to each other, is a fundamental and routine task in Computational Biology and Bioinformatics. Classically, alignment methods are the de facto standard for such an assessment. In fact, considerable research efforts for the development of efficient algorithms, both on classic and parallel architectures, have been carried out in the past 50 years. Due to the growing amount of sequence data being produced, a new class of methods has emerged: alignment-free methods. Research in this area has become very intense in the past few years, stimulated by the advent of Next Generation Sequencing technologies, since those…

Speedup; Theoretical computer science; Settore INF/01 - Informatica; Computer science; Alignment-free sequence comparison and analysis; Distributed computing; Hadoop; MapReduce; Software; Mathematics (all); Hardware and Architecture; Sequence alignment; Context (language use); Computational biology; DNA sequencing; Task (project management); Relevance (information retrieval); Pattern matching

researchProduct